Refactor oci-copy to be more efficient #1130
Merged
scoheb reviewed Jul 8, 2024
scoheb reviewed Jul 8, 2024
ralphbean force-pushed the efficient-copy branch 2 times, most recently from 04b5919 to 1478c32 (July 8, 2024 13:24)
scoheb approved these changes Jul 8, 2024
chmeliik approved these changes Jul 8, 2024
Nice solution, if somewhat low-level.
Just one nitpick about extracting the repo from the image ref (mainly for consistency with other tasks).
Originally, this task would download all artifacts requested in the input file, check them all, and then upload them all to the registry in one invocation of "oras push".

This had two problems. First, if "oras push" failed partway through and the user needed to retry their pipeline, the entire download section would have to run again needlessly. Second, for extremely large artifacts with lots of medium-sized files, an enormous PVC would be needed to hold all of them between download and push to the registry.

The change here addresses both problems.

First, files are downloaded, checked, pushed to the registry, and then deleted from local storage, one at a time. This removes the need for a large volume to store all files at once, since only enough storage is needed to hold one file, not all of them.

Second, as each file is considered, the registry is first checked to see whether the blob has already been pushed there. If it has, the download step is skipped. This greatly improves the runtime for artifacts where only one or two of many files have changed since the last taskrun.
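As a rough illustration of the per-file flow described in this commit message, here is a minimal Python sketch. It is not the task's actual script: `copy_artifacts`, `sha256_of`, and the three injected callables (`blob_exists`, `download`, `push_blob`) are hypothetical names, and the sketch assumes each artifact comes with an expected SHA-256 checksum that doubles as the blob digest. In a real task the existence check would presumably go through the registry (for example, the OCI distribution API's blob `HEAD` request), but that part is left abstract here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_artifacts(
    artifacts: List[Tuple[str, str]],            # (source URL, expected sha256 hex)
    blob_exists: Callable[[str], bool],          # hypothetical: is this digest already in the registry?
    download: Callable[[str, Path], None],       # hypothetical: fetch source URL into the given path
    push_blob: Callable[[str, Path], None],      # hypothetical: upload the file as a blob
) -> Dict[str, str]:
    """Handle artifacts one at a time; return 'skipped' or 'pushed' per source."""
    outcome: Dict[str, str] = {}
    for source, expected in artifacts:
        digest = f"sha256:{expected}"
        # Check the registry first: if the blob is already there, skip the download.
        if blob_exists(digest):
            outcome[source] = "skipped"
            continue
        # The temporary directory is removed when the block exits, so at most
        # one artifact occupies local storage at any time.
        with tempfile.TemporaryDirectory() as workdir:
            path = Path(workdir) / "artifact"
            download(source, path)
            actual = sha256_of(path)
            if actual != expected:
                raise ValueError(f"checksum mismatch for {source}: got {actual}")
            push_blob(digest, path)
        outcome[source] = "pushed"
    return outcome
```

Because blobs that were pushed before a failure are found by the existence check on the next attempt, a retried run under these assumptions only repeats the downloads that had not yet completed.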
Theoretically, this works if the IMAGE reference contains a port number.

Co-authored-by: Adam Cmiel <[email protected]>
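To show why a port number needs care when extracting the repo from an image reference, here is a small Python sketch. It is an assumption-laden illustration, not the code from this PR: `repo_from_image_ref` is a hypothetical helper, and it relies only on the common reference shape `[host[:port]/]path[:tag][@digest]`, where a tag colon can only appear after the last slash.

```python
def repo_from_image_ref(image_ref: str) -> str:
    """Strip an optional @digest and :tag, keeping any registry port intact."""
    # Drop the digest first; it always follows an '@'.
    name = image_ref.split("@", 1)[0]
    # A colon is a tag separator only when it comes after the last slash.
    # A colon before the last slash belongs to the registry's host:port.
    # Note: a bare "host:port" with no slash is treated here as "name:tag".
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    if last_colon > last_slash:
        name = name[:last_colon]
    return name
```

For example, `repo_from_image_ref("localhost:5000/org/repo:latest")` returns `"localhost:5000/org/repo"`, whereas naively cutting at the first colon would return just `"localhost"`.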